Similarity Measures in Documents Using Association Graphs
Abstract
In this paper we present a new model, called the Association Graph, to improve document representation by incorporating an ontological dimension. We explain how to generate and use this kind of graph, and we analyze several document similarity measures based on this representation. The model and the measures were evaluated against a classical vector space model, and their strengths and weaknesses were investigated. The proposed model gave promising results.
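The classical vector space model used for evaluation is commonly paired with cosine similarity between term vectors. The sketch below shows only that baseline, not the Association Graph. It assumes raw term-frequency weights and a simple lowercase regex tokenizer, and the helper names `term_vector` and `cosine_similarity` are hypothetical. The abstract does not say which weighting or preprocessing the authors actually used.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector (assumed weighting): lowercase word tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


d1 = term_vector("graph based document similarity")
d2 = term_vector("document similarity with vector space model")
print(round(cosine_similarity(d1, d2), 3))  # → 0.408
```

The two example documents share two of their terms, giving 2 / (2 · √6) ≈ 0.408. Replacing raw counts with TF-IDF weights is a common variant of this baseline.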
Related articles
Measuring the Structural Similarity of Web-based Documents: A Novel Approach
Most known methods for measuring the structural similarity of documents are based on, e.g., tag measures, path metrics, and tree measures applied to their DOM trees. Other methods measure similarity within the framework of the well-known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so c...
Investigating the applicability of centrality measures as indicators of citation relationships between documents in relational information retrieval: a pilot study
Purpose: This pilot study investigates the correlation between centrality measures and bibliographic coupling, a well-known citation-based document similarity measure. Methodology: Using citation analysis, 40 research articles belonging to four engineering/pure science disciplines (Physics, Chemistry, Biology, and Computer) and four humanities and social science disciplines (Economics, Edu...
Design of a sentence-based plagiarism detection system for Persian texts using evidence fusion
Today there are many documents on the Internet, so users can generate new documents by copying them, and existing plagiarism detection systems (PDS) cannot detect every kind of plagiarism. The main challenge is finding a suitable algorithm that improves both the number of similar documents detected and the time needed to assess them. Assessing similarity is difficult in Persian texts, where different characteri...
A Graphical Framework For Contextual Search And Name Disambiguation In Email
Similarity measures for text have historically been an important tool for solving information retrieval problems. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a struct...
Publication date: 2005